1/4/2022

About me

Purpose of labs

  • Facilitate learning theoretical concepts covered in lecture
  • Answer questions about homework
  • Review for midterms and the final
  • Implement statistical analyses from lecture in RStudio

Lab 1 Outline

  • Cover preliminary statistics and probability concepts
  • Intro to R and setup

Random Variable

  • A variable that takes on specific values with specific probabilities.
  • Measurable functions that map outcomes of a stochastic process to a measurable space.
  • Typically denoted by a capital letter (e.g., X, Y, Z).
  • Denote the outcome of a coin flip as Y
    • Y = 1 if Heads, else Y = 0
    • P(Y = 1) = 0.5
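The coin-flip example can be sketched in R. This is a minimal illustration (the sample size of 100 is arbitrary); `rbinom` with `size = 1` draws Bernoulli outcomes:

```r
# Simulate the coin-flip random variable Y: 1 = Heads, 0 = Tails
set.seed(1)                              # reproducible draws
Y <- rbinom(100, size = 1, prob = 0.5)   # 100 realizations of Y
mean(Y)                                  # sample proportion of Heads, near P(Y = 1) = 0.5
```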

Probability distribution

  • Probability mass function (PMF)
    • Assigns probabilities to individual values of a discrete random variable

Probability distribution

  • Probability density function (PDF)
    • Similar to a PMF, but instead specifies the probability that a continuous random variable falls within a range of values (the area under the density curve over that range).
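As a quick sketch using the standard normal distribution: for a continuous variable, any single point has probability zero, and probabilities come from areas under the density curve.

```r
# dnorm gives the density of N(0, 1), not a probability
dnorm(1)               # density at x = 1
pnorm(1) - pnorm(-1)   # P(-1 < X < 1), the area under the curve: ~0.683
```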

Normal Distribution Notation

X ~ N(\(\mu\),\(\sigma^2\))
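One caveat worth flagging when translating this notation into R: the normal-distribution functions are parameterized by the standard deviation \(\sigma\), not the variance \(\sigma^2\). A minimal sketch (the values 70 and 25 are hypothetical):

```r
mu <- 70
sigma2 <- 25                                      # variance
set.seed(10)
x <- rnorm(10000, mean = mu, sd = sqrt(sigma2))   # note: sd, not variance
mean(x)   # near 70
var(x)    # near 25
```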

Expected value and Variance

  • The expected value of a random variable Y is denoted as E(Y).
    • Probability weighted average of all possible values.
y <- c(70, 80, 85, 90, 100)             # possible values of Y
p.y <- c(0.18, 0.34, 0.35, 0.11, 0.02)  # their probabilities (sum to 1)

E.y <- sum(y * p.y)  # probability-weighted average
E.y
## [1] 81.45
  • Variance
    • A measure of how dispersed the possible values of a random variable are around the expected value (i.e., the population or sample mean)
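Continuing the grade example above, the variance can be computed the same way, as the probability-weighted average of squared deviations from E(Y):

```r
y <- c(70, 80, 85, 90, 100)
p.y <- c(0.18, 0.34, 0.35, 0.11, 0.02)
E.y <- sum(y * p.y)

# probability-weighted average of squared deviations from E(Y)
Var.y <- sum((y - E.y)^2 * p.y)
Var.y
## [1] 43.6475
```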

Multivariate Distributions

PDF \(\rightarrow\) Joint Probability Density

  • For the random variables X and Y, the joint pdf characterizes the probability that X and Y jointly take on particular sets of values.

Multivariate Distributions

Variance \(\rightarrow\) Covariance

  • Measure of how much two random variables vary together.
  • Formally, it is the expected value of the product of each random variable’s deviation from its respective mean:

    cov(X,Y) = E[(X - E[X])(Y - E[Y])]

  • For several random variables, covariances are typically collected in a covariance matrix.
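A small simulated sketch (the data here are made up) showing that the definition above matches R's built-in `cov()`, up to the sample-versus-population divisor:

```r
set.seed(42)
x <- rnorm(1000)
y <- 2 * x + rnorm(1000)             # y varies with x, so cov(x, y) should be positive

mean((x - mean(x)) * (y - mean(y)))  # the definition: E[(X - E[X])(Y - E[Y])]
cov(x, y)                            # built-in sample covariance (divides by n - 1)
cov(cbind(x, y))                     # 2 x 2 covariance matrix
```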

Marginal Distribution

  • Probability distribution of one random variable, obtained by summing the joint distribution over all outcomes of the other random variable

             x1        x2        x3        x4        pY(yi)
    y1       0.125     0.0625    0.03125   0.03125   0.25
    y2       0.09375   0.1875    0.09375   0.09375   0.46875
    y3       0.28125   0         0         0         0.28125
    pX(xi)   0.5       0.25      0.125     0.125     1
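The joint PMF above can be reproduced in R, where the marginal distributions are just the row and column sums of the joint probability matrix:

```r
# Joint PMF of Y (rows) and X (columns) from the table above
joint <- matrix(c(0.12500, 0.06250, 0.03125, 0.03125,
                  0.09375, 0.18750, 0.09375, 0.09375,
                  0.28125, 0.00000, 0.00000, 0.00000),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("y1", "y2", "y3"),
                                c("x1", "x2", "x3", "x4")))
rowSums(joint)  # marginal pY(yi): 0.25, 0.46875, 0.28125
colSums(joint)  # marginal pX(xi): 0.5, 0.25, 0.125, 0.125
sum(joint)      # all probabilities sum to 1
```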

Pearson correlation & Statistical independence

  • Pearson correlation coefficient \(\rho\) measures the strength and direction of the linear relationship between two random variables
  • Two random variables are statistically independent if the realization of one does not affect the outcome of the other.
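As a sketch with simulated data: \(\rho\) can be computed from its definition, cov(X, Y)/(\(\sigma_X \sigma_Y\)), or with the built-in `cor()`; a variable generated independently of X gives a correlation near zero. (Note the converse does not hold in general: zero correlation rules out only a *linear* relationship.)

```r
set.seed(7)
x <- rnorm(500)
y <- 0.8 * x + rnorm(500, sd = 0.6)   # linearly related to x
z <- rnorm(500)                       # generated independently of x

cov(x, y) / (sd(x) * sd(y))  # rho from its definition
cor(x, y)                    # same value via the built-in
cor(x, z)                    # near 0 for independent variables
```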

Likelihood

  • Throughout this course we are going to use statistical models (e.g., regression, ANOVA) to describe patterns of variability in random variables.
  • These models have parameters (e.g., mean of a sampling distribution)
  • A likelihood function is the joint probability of observed data as a function of parameters in a statistical model.
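A minimal sketch, assuming five hypothetical observations from a normal model with known \(\sigma = 1\): the likelihood is the joint density of the observed data, viewed as a function of the unknown mean \(\mu\).

```r
obs <- c(4.8, 5.1, 5.3, 4.9, 5.4)   # hypothetical observations

# Likelihood of mu: joint density of the data, sigma fixed at 1
lik <- function(mu) prod(dnorm(obs, mean = mu, sd = 1))

lik(5)   # likelihood at mu = 5
lik(0)   # mu far from the data: much smaller likelihood

# Grid search: the mu maximizing the likelihood is the sample mean
mus <- seq(4, 6, by = 0.01)
mus[which.max(sapply(mus, lik))]
```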

Before we get into setting up R, I wanted to share a few programming tips.

  1. Google is your best friend. Getting good at programming isn't about memorizing all the functions; really it's just knowing what keywords to type into Google to find the snippets of code you need.
  2. There's more than one way to code! Some ways are more efficient than others, but we aren't doing computationally heavy analyses in this course, so no need to worry about that. Code that you see from me is not dogma, so if you have another way of doing it and getting the correct output, more power to you!
  3. I encourage you all to learn more than one programming language, preferably Python. Not only is it good if you want to go into data science, but it also helps you think abstractly about approaching code.
  4. And of course, have fun! There will be pain, sweat, and tears, but that doesn't mean it can't be fun.

Alright, any questions?